Large vocabulary continuous speech recognition for Vietnamese, an under-resourced language
نویسندگان
چکیده
This paper proposes a method to build a Vietnamese Large Vocabulary Continuous Speech Recognition system (Vietnamese LVCSR system). The difference between Vietnamese and European languages is analyzed and used to adapt a LVCSR system for European languages to Vietnamese. Experiments are implemented on the VNSPEECHCORPUS. The results show that the accuracy of Vietnamese recognition system is increased by using Vietnamese language characteristics.
منابع مشابه
Large Vocabulary Continuous Speech Recognition for Vietnamese, a Under-resourced Language
This paper proposes a method to build a Vietnamese Large Vocabulary Continuous Speech Recognition system (Vietnamese LVCSR system). The difference between Vietnamese and European languages is analyzed and used to adapt a LVCSR system for European languages to Vietnamese. Experiments are implemented on the VNSPEECHCORPUS. The results show that the accuracy of Vietnamese recognition system is inc...
متن کاملFirst steps in building a large vocabulary continuous speech recognition system for Vietnamese
This paper presents an overview of our activities for building a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Vietnamese implemented at CLIPS-IMAG Laboratory (France) and International Research Center MICA (Vietnam). Firstly, a new methodology for fast text corpora acquisition for minority languages which has been applied to Vietnamese is proposed. Secondly, the first resul...
متن کاملSpoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...
متن کاملA novel approach in continuous speech recognition for Vietnamese, an isolating tonal language
This paper proposes a new approach for the integration of the Vietnamese language characteristics into a Large Vocabulary Continuous Speech Recognition System (LVCSR) which was built for some European languages. Firstly, a new module of tone recognition using Hidden Markov model was constructed. Secondly, several methods were applied to transform a text corpus of monosyllabic words into text co...
متن کاملSMT-based ASR domain adaptation methods for under-resourced languages: Application to Romanian
This study investigates the possibility of using statistical machine translation to create domainspecific language resources. We propose a methodology that aims to create a domain-specific automatic speech recognition (ASR) system for a low-resourced language when in-domain text corpora are available only in a high-resourced language. Several translation scenarios (both unsupervised and semi-su...
متن کامل